Training Phrase-Based Machine Translation Models on the Cloud: Open Source Machine Translation Toolkit Chaski

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Training Phrase-Based Machine Translation Models on the CloudOpen Source Machine Translation Toolkit Chaski

In this paper we present an opensource machine translation toolkit Chaski which is capable of training phrase-based machine translation models on Hadoop clusters. The toolkit provides a full training pipeline including distributed word alignment, word clustering and phrase extraction. The toolkit also provides an extended error-tolerance mechanism over standardHadoop error-tolerance framework. ...

متن کامل

NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierachical phrase-based model, and various syntaxbased models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers...

متن کامل

Joshua: An Open Source Toolkit for Parsing-Based Machine Translation

We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context free grammars (SCFGs): chart-parsing, ngram language model integration, beamand cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed c...

متن کامل

SampleRank Training for Phrase-Based Machine Translation

Statistical machine translation systems are normally optimised for a chosen gain function (metric) by using MERT to find the best model weights. This algorithm suffers from stability problems and cannot scale beyond 20-30 features. We present an alternative algorithm for discriminative training of phrasebasedMT systems, SampleRank, which scales to hundreds of features, equals or beats MERT on b...

متن کامل

Continuous Space Translation Models for Phrase-Based Statistical Machine Translation

This paper presents a new approach to perform the estimation of the translation model probabilities of a phrase-based statistical machine translation system. We use neural networks to directly learn the translation probability of phrase pairs using continuous representations. The system can be easily trained on the same data used to build standard phrase-based systems. We provide experimental e...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: The Prague Bulletin of Mathematical Linguistics

سال: 2010

ISSN: 1804-0462,0032-6585

DOI: 10.2478/v10108-010-0004-8